
Fixed issues #107 and #123, receiver check connection and when pop msg fail… #147

Closed
wants to merge 1 commit into from

yonker-yk

Fixes issues #107 and #123. When popping a message fails, the receiver now checks the connection status and calls the reconnect function if the check shows that the connection is down. The root cause of the receiver's failure to pop a message is that the sender process invokes force_push() when it fails to push data into the circular queue. force_push() then identifies which receiver has not read the data and disconnects it. Typically, the receiver process has not read the data because it is either blocked (e.g., paused at a breakpoint) or stalled (e.g., when the system CPU load is high). The receiver process then has no way of knowing that it was disconnected, so it either waits on ::SignalObjectAndWait (when there is only one receiver in the connection) or retries popping the message endlessly.

The solution is therefore to make the receiver aware of disconnections so that it can recover, for example by calling reconnect(). To achieve this, I have optimized the connect function and added reconnection logic to the recv() function. This approach has proven effective.
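The recovery path described above can be sketched in isolation. The following is a minimal, single-process illustration, not the code from this PR: `MockQueue`, `Receiver`, `try_pop()`, and `force_disconnect()` are hypothetical stand-ins, and the sketch assumes a broadcast-style bitmask where each receiver owns one bit.

```cpp
#include <cstdint>
#include <queue>
#include <string>

// Hypothetical stand-in for the shared circular queue (assumption: broadcast
// mode, where each connected receiver owns exactly one bit in `connections`).
struct MockQueue {
    std::uint32_t connections = 0;   // one bit per connected receiver
    std::queue<std::string> data;    // pending messages

    // Claim the lowest free bit; returns 0 if all 32 slots are taken.
    std::uint32_t connect() {
        for (std::uint32_t bit = 1; bit != 0; bit <<= 1) {
            if ((connections & bit) == 0) {
                connections |= bit;
                return bit;
            }
        }
        return 0;
    }

    // What a sender-side force push would do to a lagging receiver.
    void force_disconnect(std::uint32_t id) { connections &= ~id; }
};

class Receiver {
public:
    explicit Receiver(MockQueue& q) : q_(q), id_(q.connect()) {}

    std::uint32_t id() const { return id_; }
    int reconnect_count() const { return reconnects_; }

    // Connected only while our bit is still set in the shared mask.
    bool connected() const { return id_ != 0 && (q_.connections & id_) != 0; }

    void reconnect() {
        id_ = q_.connect();
        ++reconnects_;
    }

    // On a failed pop, check the connection; reconnect and retry once if the
    // sender dropped us. Returns false if there is simply nothing to read.
    bool recv(std::string& out) {
        if (try_pop(out)) return true;
        if (!connected()) {
            reconnect();
            return try_pop(out);
        }
        return false;
    }

private:
    bool try_pop(std::string& out) {
        if (!connected() || q_.data.empty()) return false;
        out = q_.data.front();
        q_.data.pop();
        return true;
    }

    MockQueue& q_;
    std::uint32_t id_;
    int reconnects_ = 0;
};
```

The point of the sketch is the ordering inside `recv()`: the connection check runs only after a pop has failed, so the normal receive path pays no extra cost. A real shared-memory queue would additionally need atomics and a wait/notify mechanism, which are omitted here.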

…d, and call reconnect function when the connection check result is false
@yonker-yk yonker-yk changed the title Fixed issue 107 and 123, receiver check connection and when pop msg fail… Fixed issues #107 and #123, receiver check connection and when pop msg fail… May 9, 2025
@mutouyun mutouyun mentioned this pull request May 10, 2025
@mutouyun
Owner

mutouyun commented May 10, 2025

Thank you very much for your PR! However, there are some minor issues in the commit. By default, the connected function uses a bitwise operation to decide whether the receiver is connected, which is inaccurate. In unicast mode the connection flags are used solely for counting, so relying entirely on connected_ & elems->connections() gives wrong results. I made some modifications on top of your commit and submitted a new PR. See: #148
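To illustrate why a pure bit test can misreport connectivity when the shared value is a count rather than a mask, here is a small self-contained sketch. It is not the change made in #148; `Mode`, `naive_connected()`, and `mode_aware_connected()` are hypothetical names, and the unicast branch assumes only what the comment above states, namely that the shared value counts connections.

```cpp
#include <cstdint>

// Hypothetical modes, named after the ones mentioned in this conversation.
enum class Mode { Broadcast, Unicast };

// Naive check: treats the shared value as a bitmask in every mode.
inline bool naive_connected(std::uint32_t local, std::uint32_t shared) {
    return (local & shared) != 0;
}

// Mode-aware check (assumption for illustration):
//  - Broadcast: `shared` is a bitmask and `local` is this receiver's bit.
//  - Unicast:   `shared` is a connection count and `local` is merely nonzero
//               while connected, so the two values must not be AND-ed.
inline bool mode_aware_connected(Mode mode, std::uint32_t local,
                                 std::uint32_t shared) {
    if (mode == Mode::Broadcast) {
        return (local & shared) != 0;
    }
    return local != 0 && shared != 0;
}
```

For example, with two unicast receivers the count is 2; a receiver whose local value is 1 gets `1 & 2 == 0` from the naive check and is wrongly reported as disconnected, while the mode-aware check reports it as connected.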

@mutouyun mutouyun closed this May 10, 2025
@yonker-yk
Author

Thank you for the correction. My project only uses the NvN broadcast mode, so I did not consider the unicast case. Yesterday I saw that your unit tests caught this abnormal case, which is really impressive! I pulled your changes and verified them, and everything works. I learned a lot. This project is excellent, thank you again!
